A method of synthesizing creaky voice



The present invention relates to the field of speech synthesis and, more
particularly but without limitation, to the field of text-to-speech synthesis.

The function of a text-to-speech (TTS) synthesis system is to synthesize
speech from a generic text in a given language. Nowadays, TTS systems have been put into
practical operation for many applications, such as access to databases through the telephone
network or aids for handicapped people. One method to synthesize speech is by concatenating
elements of a recorded set of subunits of speech, such as demisyllables or polyphones. The
majority of successful commercial systems employ the concatenation of polyphones.

The polyphones comprise groups of two (diphones), three (triphones) or more
phones and may be determined from nonsense words by segmenting the desired grouping of
phones at stable spectral regions. In a concatenation-based synthesis, the conservation of the
transition between two adjacent phones is crucial to ensure the quality of the synthesized
speech. With the choice of polyphones as the basic subunits, the transition between two
adjacent phones is preserved in the recorded subunits, and the concatenation is carried out
between similar phones.
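As a minimal illustration of how a phone string maps onto polyphone subunits, the sketch below lists the diphones and triphones of a phonetic transcription. It only enumerates groups of adjacent phones; the segmentation of recordings at stable spectral regions is not modelled, and the `polyphones` helper and the phone symbols are hypothetical.

```python
def polyphones(phones: list[str], n: int) -> list[tuple[str, ...]]:
    """Return the groups of n adjacent phones (n=2: diphones, n=3: triphones)."""
    if n < 2:
        raise ValueError("a polyphone spans at least two phones")
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


# Hypothetical transcription of "hello", padded with silence "_" at both ends.
word = ["_", "h", "@", "l", "oU", "_"]
print(polyphones(word, 2))  # diphones
print(polyphones(word, 3))  # triphones
```

Each diphone starts and ends inside a phone, so the phone-to-phone transition stays inside one recorded unit, which is why concatenation happens between similar phones.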

Before the synthesis, however, the phones must have their duration and pitch
modified in order to fulfil the prosodic constraints of the new words containing those phones.
This processing is necessary to avoid producing monotonous-sounding synthesized
speech. In a TTS system, this function is performed by a prosodic module. To allow the
duration and pitch modifications in the recorded subunits, many concatenation-based TTS
systems employ the time-domain pitch-synchronous overlap-add (TD-PSOLA) model of synthesis
(E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech
synthesis using diphones," Speech Commun., vol. 9, pp. 453-467, 1990).
When a signal is to be synthesized with an increased duration by means of a
known PSOLA method, each of the pitch bells is repeated a number of times corresponding
to the desired increase of the duration. For example, if the duration is to be doubled, each
period of the original signal is repeated once. When this approach is applied to creaky voice, the
resulting synthesized signal sounds unnatural and the creaky character of the voice is lost.
The present invention therefore aims to provide an improved method of
synthesizing a signal which enables creaky voice to be synthesized. Further, the present invention
aims to provide a corresponding computer program product and computer system, in
particular a text-to-speech system.
The present invention provides a method of synthesizing a signal having
alternating strong and weak periods, as is the case for creaky voice.

Creaky voice is often found at the end of a sentence, where the pitch of the
speaker is at its low end. Creaky voice is characterized by irregularity of pitch-period
durations. One common version of creaky voice has alternating strong and weak periods. The
present invention is based on the discovery that, when a prior art PSOLA-type method is
applied to synthesize a signal having an increased duration, the alternation of the strong
and weak periods is lost, and that therefore an unnatural sounding amplitude variation is
added to the synthesized speech. The invention enables such a creaky voice
characteristic to be preserved in the synthesized signal.

In accordance with a preferred embodiment of the invention, the strong and the
weak periods of an original creaky voice sound signal are classified by marking the periods
with different class-types. This information is used to make an alternating choice between the
strong and the weak periods. By choosing nearest neighboring periods for the selection of
pitch bells, the form of the signal envelope is also preserved in the synthesized signal having
the increased duration.

The present invention is particularly advantageous for text-to-speech synthesis
systems. In accordance with a preferred embodiment of the invention, such a text-to-speech
synthesis system contains a data file for storing classification information of the original
sound signal. By means of this classification information, creaky voice intervals having
alternating strong and weak periods are identified.
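The description does not specify the format of the classification data file. Assuming, hypothetically, that it stores one classifier per period ('v' for voiced, 'e' for strong creaky, 'o' for weak creaky, as in the Fig. 1 example), creaky voice intervals with alternating strong and weak periods could be located as sketched below. The minimum run length of two periods and the function name are assumptions.

```python
def creaky_intervals(labels: list[str], min_periods: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) period indices (end exclusive) of maximal runs in
    which the creaky labels 'e' and 'o' strictly alternate."""
    intervals: list[tuple[int, int]] = []
    start = None  # index where the current alternating run began

    def close(end: int) -> None:
        # Record the open run if it is long enough to count as creaky voice.
        if start is not None and end - start >= min_periods:
            intervals.append((start, end))

    for i, label in enumerate(labels):
        if label not in ("e", "o"):
            close(i)        # a non-creaky period ends any open run
            start = None
        elif start is None or labels[i - 1] == label:
            close(i)        # no run yet, or alternation broken: restart here
            start = i
    close(len(labels))
    return intervals


labels = ["v", "v", "e", "o", "e", "o", "v", "e", "e", "o"]
print(creaky_intervals(labels))  # [(2, 6), (8, 10)]
```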

This classification information can be generated by means of a computer
program which analyses the original signal in order to detect the characteristics of creaky
voice within the signal. Alternatively, this classification can be performed by a human expert.
It is to be noted that the classification only needs to be performed once; after the initial
classification, an unlimited number of signals of a variety of durations can be synthesized
without further interaction.
In the following, preferred embodiments of the invention are described in
greater detail by making reference to the drawings, in which:

Fig. 1 is illustrative of a sound signal containing creaky voice and a
synthesized signal having an increased duration,

Fig. 2 is a flow chart of an embodiment of a method of the invention, and

Fig. 3 is a block diagram of a preferred embodiment of a computer system.



Fig. 1 shows an original signal 100 having a duration of 0.07 seconds. The
periods of the original signal 100 are classified as 'v', 'e' or 'o': the classifier 'v' identifies
periods of type 'voiced'; the classifiers 'e' and 'o' identify periods which are of type
'creaky', whereby 'e' designates strong periods and 'o' designates weak periods. In this
context, 'weak' means that the amplitude within that period of the creaky voice interval is
lower than the amplitude of the immediately preceding period; likewise, 'strong' means that
the amplitude of that period of the creaky voice sound is higher than the amplitude of the
immediately preceding period of the creaky voice sound interval. This classification of the
original signal 100 can be performed by means of a computer program which analyses the
original signal 100 in order to identify the above described signal characteristics.
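The strong/weak rule above can be sketched as follows. This is a minimal illustration, not the classification program referred to in the description: it assumes the periods of a creaky interval are delimited by known pitch marks, measures each period by its peak absolute amplitude, labels ties as weak, and labels the first period (which has no predecessor) by comparison with the second. All of these choices and the function names are assumptions.

```python
def period_peaks(samples: list[float], marks: list[int]) -> list[float]:
    """Peak absolute amplitude of each period between consecutive pitch marks."""
    return [max(abs(s) for s in samples[a:b]) for a, b in zip(marks, marks[1:])]


def classify_creaky(amplitudes: list[float]) -> list[str]:
    """Label each period of a creaky interval 'e' (strong) or 'o' (weak).

    A period is strong if its amplitude exceeds that of the immediately
    preceding period, otherwise weak (ties count as weak: an assumption).
    The first period is compared with the second instead (also an assumption).
    """
    if len(amplitudes) < 2:
        raise ValueError("need at least two periods")
    labels = ["e" if amplitudes[0] > amplitudes[1] else "o"]
    for prev, cur in zip(amplitudes, amplitudes[1:]):
        labels.append("e" if cur > prev else "o")
    return labels


print(classify_creaky([0.9, 0.4, 0.8, 0.3]))  # ['e', 'o', 'e', 'o']
```

As the description suggests, such automatic labels would typically be reviewed by a human expert before being stored.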
Alternatively, this classification can also be performed manually by a human expert. It is
preferred that the classification is performed in a first step by means of a computer program
and is then reviewed in a second step by a human expert for improved precision of the
classification. Original signal 100 and its classification serve as a basis for the
synthesized signal 102. The synthesized signal 102 is required to have a duration of about
0.16 seconds, which is about twice the duration of the original signal 100. In order to
synthesize the signal 102 with this required duration, pitch bell locations j are determined on
the time axis 104 in the domain of the synthesized signal 102. The pitch bell locations j are
spaced apart on the time axis 104 by the period p as given by the fundamental frequency of the
signal to be synthesized. It is to be noted that the signal to be synthesized can have the same
or another pitch/fundamental frequency as the original signal. The first required pitch bell
location j = 1 is of type 'e', as is the case for the first period e1 of the creaky voice sound
interval within the original signal 100. As a consequence, a pitch bell is obtained from the
period e1 of the original signal 100 by means of windowing. The following required pitch
bell location j = 2 requires a pitch bell of type 'o', as the synthesis of creaky voice requires
alternating strong and weak periods. In order to also maintain the form of the signal envelope
of the creaky voice sound interval of the original signal 100, a pitch bell is obtained from
the nearest neighboring period of type 'o' within the original signal 100, which is period o1.
The following required pitch bell location j = 3 again requires a pitch bell of type 'e'. This
pitch bell is obtained from the period categorized as 'e' within the original signal 100
which is the nearest neighbor to the required pitch bell location j = 3. This nearest neighbor is
the period e1 within the original signal 100. This means that a pitch bell is obtained for pitch
bell location j = 3 by windowing period e1 of the original signal 100.

Likewise, the consecutive pitch bell location j = 4 needs to be of type 'o'.
Again, the closest period of that type within the original signal 100 is selected in order to obtain a
pitch bell. This closest period of the required type is the period o1. This process is performed
with respect to all required pitch bell locations j on the time axis 104 in order to obtain a pitch
bell for each of the required pitch bell locations.
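The selection walked through above can be reproduced with a small sketch. It assumes a creaky interval of four periods e1, o1, e2, o2 starting every 10 ms, a doubled duration at unchanged pitch, and that each synthesis location j is compared with the original periods after being mapped back onto the original time axis by the ratio of the durations. The description only speaks of the nearest neighbor "in the time domain of the original signal", so this mapping, the tie rule (earlier period wins) and all names are assumptions.

```python
def nearest_of_type(t_orig: float, times: list[float], types: list[str],
                    wanted: str) -> int:
    """Index of the original period of class `wanted` closest to t_orig.
    On a tie the earlier period wins, because min() keeps the first minimum."""
    candidates = [i for i, typ in enumerate(types) if typ == wanted]
    return min(candidates, key=lambda i: abs(times[i] - t_orig))


# Hypothetical creaky interval: periods e1, o1, e2, o2 starting every 10 ms.
times = [0.0, 10.0, 20.0, 30.0]
types = ["e", "o", "e", "o"]
names = ["e1", "o1", "e2", "o2"]

# Doubled duration: eight synthesis locations 10 ms apart, mapped back onto
# the original time axis with the duration ratio 0.5 (assumption).
ratio = 0.5
chosen = []
for j in range(8):
    wanted = "e" if j % 2 == 0 else "o"   # alternate strong/weak, starting strong
    t_orig = j * 10.0 * ratio
    chosen.append(names[nearest_of_type(t_orig, times, types, wanted)])
print(chosen)  # ['e1', 'o1', 'e1', 'o1', 'e2', 'o2', 'e2', 'o2']
```

The first four choices match the example (e1, o1, e1, o1), and the alternation of strong and weak bells is kept throughout.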

The resulting pitch bells are then overlapped and added in order to synthesize
the required signal 102 containing synthesized creaky voice with an increased duration. The
resulting synthesized signal 102 has a sequence of alternating strong and weak periods, as is
the case in the original signal 100, so that this characteristic of the original signal is
maintained. Because nearest neighboring periods of the required category are always
selected from the original signal 100 for obtaining the pitch bells, the form of the signal
envelope of the creaky part of the original signal 100 is also preserved. The result is
a natural sounding synthesized signal 102 having all of the characteristics of the original
creaky voice sound signal, but with an increased duration.
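Windowing and overlap-add are the standard TD-PSOLA operations. As a minimal sketch, assuming each pitch bell is a Hann-windowed segment of two periods centred on a pitch mark (the description fixes neither the window shape nor its length), the two steps could look like this; the function names are hypothetical.

```python
import math


def hann(n: int) -> list[float]:
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def pitch_bell(samples: list[float], mark: int, period: int) -> list[float]:
    """Window two periods of the signal centred on a pitch mark.
    Samples outside the signal are treated as zero."""
    bell = []
    for k, w in enumerate(hann(2 * period + 1)):
        idx = mark - period + k
        s = samples[idx] if 0 <= idx < len(samples) else 0.0
        bell.append(w * s)
    return bell


def overlap_add(bells: list[list[float]], centres: list[int], length: int) -> list[float]:
    """Add each bell into the output signal with its centre at the given sample."""
    out = [0.0] * length
    for bell, c in zip(bells, centres):
        half = len(bell) // 2
        for k, v in enumerate(bell):
            n = c - half + k
            if 0 <= n < length:
                out[n] += v
    return out
```

With this window, two bells placed one period apart sum to exactly one in their overlap, so re-placing unmodified bells at their original centres reproduces the signal. For synthesis, the bells chosen by the nearest-neighbour selection would instead be placed at the required pitch bell locations.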

Fig. 2 shows a corresponding flow chart. In step 200, an original sound signal
is provided. The original sound signal contains at least one interval containing creaky voice.
In step 202, creaky voice sound periods are identified and classified. This can be done
manually, by means of a computer program or with the assistance of a computer program. To
retain the naturalness of the creak, the strong and weak periods are marked with different
class-types and this information is used to make an alternating choice between the strong and
weak periods. Strong (even) periods are marked by type '1' and weak (odd) periods are
marked by type '-1'. In step 204, pitch bells are obtained from the original sound signal by
means of windowing. The windowing operation is performed by means of windows which
are positioned synchronously with the fundamental frequency of the original sound. In step
206, the required pitch bell locations j in the time domain of the signal to be synthesized are
determined. If the signal to be synthesized is required to have a certain duration, this implies
that a number x of pitch bell locations spaced apart by the period p are required,
where the number x is greater than the number of periods contained in the original signal. In
step 208, the index j is initialized to be equal to 1. In step 210, the type index t is initialized to be
equal to 1. The index t indicates the type, which is either '1' or '-1'. In step 212, a pitch bell is
selected for the pitch bell location j in the time domain of the signal to be synthesized. This
selection is performed by searching for the nearest neighbor of pitch bell location j in the
time domain of the original signal which has the required type t. In this way, a pitch bell of type
t is selected from the nearest neighbor of pitch bell location j in the time domain of the
original signal. In step 214, the index j is incremented in order to go to the next pitch bell
location j. In step 216, the type parameter t is multiplied by -1 in order to toggle the required
type; after the first pass this changes it to the category 'weak'. As a consequence, in the following step 212 a nearest neighbor
of type '-1' for the consecutive pitch bell location j is selected from the domain of
the original signal. Steps 212, 214 and 216 are repeatedly carried out until pitch bells have
been selected for all of the required pitch bell locations j. After this selection process has
been completed, an overlap and add operation is performed; the resulting signal contains
creaky voice and has the required duration.
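Steps 206 to 216 can be sketched as follows. The sketch assumes a constant synthesis period p, a classification stored as 1 (strong) or -1 (weak) per original period, and that each location j is compared with the original pitch marks after scaling by the ratio of the original to the required duration; none of these details is fixed by the flow chart, and the function names are hypothetical. Like the flow chart, it starts with a strong period (t = 1) and toggles t after every selection.

```python
import math


def pitch_bell_locations(duration: float, period: float) -> list[float]:
    """Step 206: the x locations, spaced by the synthesis period p, that fit
    into the required duration (x = floor(duration / p))."""
    x = math.floor(duration / period)
    return [j * period for j in range(x)]


def select_bells(locations: list[float], orig_times: list[float],
                 orig_types: list[int], time_scale: float) -> list[int]:
    """Steps 208 to 216: for each location j pick the nearest original period
    of the required type t, then toggle t between 1 (strong) and -1 (weak)."""
    selected = []
    t = 1                                    # step 210
    for loc in locations:                    # steps 208/214: j = 1, 2, ...
        target = loc * time_scale            # position on the original axis (assumption)
        candidates = [i for i, typ in enumerate(orig_types) if typ == t]
        selected.append(min(candidates, key=lambda i: abs(orig_times[i] - target)))  # step 212
        t *= -1                              # step 216
    return selected


orig_times = [10.0 * k for k in range(7)]     # hypothetical pitch marks in ms
orig_types = [1, -1, 1, -1, 1, -1, 1]         # 1 = strong, -1 = weak
locs = pitch_bell_locations(140.0, 10.0)      # doubled duration, same pitch
print(select_bells(locs, orig_times, orig_types, 70.0 / 140.0))
# [0, 1, 0, 1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 5]
```

The selected indices would then be passed to windowing and overlap-add to produce the output signal.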

Fig. 3 shows a block diagram of a computer system 300, such as a text-to-speech
system. The computer system 300 has a module 302 for storing a recording of an
original sound signal comprising a creaky voice sound interval. Module 304 serves to store
sound classification information, i.e. the classifiers 'v', 'e' and 'o' as illustrated in
the example of Fig. 1. Module 306 serves for windowing of the original sound signal in
order to obtain pitch bells. Module 308 serves to determine the required pitch bell locations
in the domain of the signal to be synthesized. This is done based on the required length y of
the signal to be synthesized and the required fundamental frequency of the signal to be
synthesized, which may or may not be equal to the fundamental frequency of the original sound
signal. Module 310 serves for selection of the pitch bells which are obtained from module 306.
The pitch bells are selected in accordance with steps 212, 214 and 216 as illustrated in Fig. 2.
This means that creaky voice is obtained by creating a sequence of alternating strong and
weak periods while preserving the form of the signal envelope of the original sound. Module
312 serves to perform an overlap and add operation on the pitch bells selected by module
310. In this way, the required synthesized signal is obtained.
CLAIMS:



1. A method of synthesizing a signal comprising the steps of:

a) providing of a first signal having first periods of a first type and second
periods of a second type in an alternating sequence,

b) windowing of the first signal to provide a pitch bell for each of the first and
second periods,

c) determining a number of required pitch bell locations for a second signal to be
synthesized,

d) selecting of one of the pitch bells for a first one of the required pitch bell
locations by identifying the nearest neighboring period of the first one of the required pitch
bell locations being of the first type, and selecting of the pitch bell of the identified period,

e) selecting of one of the pitch bells for a second one of the required pitch bell
locations by identifying a nearest neighboring period of the second one of the required pitch
bell locations having the second type, and selecting the pitch bell of the identified period,

whereby the steps d) and e) are carried out for all of the required pitch bell
locations,

f) performing an overlap and add operation on the selected pitch bells in order to
synthesize the second signal.

2. The method of claim 1, the first signal having alternating strong and weak
periods of substantially the same signal form.

3. The method of claims 1 or 2, the first signal being a creaky voice signal.

4. The method of claims 1, 2 or 3, whereby the required pitch bell locations are
determined in order to increase the duration of the second signal to be synthesized.



5. A computer program product, in particular a digital storage medium, comprising
program means for performing the steps of:

a) providing of a first signal having first periods of a first type and second
periods of a second type in an alternating sequence,

b) windowing of the first signal to provide a pitch bell for each of the first and
second periods,

c) determining a number of required pitch bell locations for a second signal to be
synthesized,

d) selecting of one of the pitch bells for a first one of the required pitch bell
locations by identifying the nearest neighboring period of the first one of the required pitch
bell locations being of the first type, and selecting of the pitch bell of the identified period,

e) selecting of one of the pitch bells for a second one of the required pitch bell
locations by identifying a nearest neighboring period of the second one of the required pitch
bell locations having the second type, and selecting the pitch bell of the identified period,

whereby the steps d) and e) are carried out for all of the required pitch bell
locations,

f) performing an overlap and add operation on the selected pitch bells in order to
synthesize the second signal.

6. The computer program product of claim 5, the program means being adapted to
determine the required pitch bell locations in accordance with a required duration of the
second signal to be synthesized.

7. A computer system, in particular a text-to-speech synthesis system, comprising:

- means for providing of a first signal having first periods of a first type and
second periods of a second type in an alternating sequence,

- means for windowing of the first signal to provide a pitch bell for each of the
first and second periods,

- means for determining a number of required pitch bell locations for a second
signal to be synthesized,

- means for selecting of one of the pitch bells for a first one of the required pitch
bell locations by identifying the nearest neighboring period of the first one of the required
pitch bell locations being of the first type, and selecting of the pitch bell of the identified
period, and for selecting of one of the pitch bells for a second one of the required pitch bell
locations by identifying a nearest neighboring period of the second one of the required pitch
bell locations having the second type, and selecting the pitch bell of the identified period,

- means for performing an overlap and add operation on the selected pitch bells
in order to synthesize the second signal.

8. The computer system of claim 7, further comprising means for storing of
classification data for identifying the first and second periods of the first signal.



9. A synthesized signal comprising a number of pitch bells which are overlapped
and added, the pitch bells being of first and second types, the first and second types having
substantially the same signal form and varying amplitudes, the pitch bells being selected to
form an alternating sequence of first and second type pitch bells.
ABSTRACT:



The invention relates to a method of synthesizing a signal comprising the steps
of:

a) providing of a first signal having first periods of a first type and second
periods of a second type in an alternating sequence,

b) selecting of one of the pitch bells for a first one of the required pitch bell
locations by identifying the nearest neighboring period of the first one of the required pitch
bell locations being of the first type, and selecting of the pitch bell of the identified period,

c) selecting of one of the pitch bells for a second one of the required pitch bell
locations by identifying a nearest neighboring period of the second one of the required pitch
bell locations having the second type, and selecting the pitch bell of the identified period,

whereby the steps b) and c) are carried out for all of the required pitch bell
locations.

Fig. 2